On Thursday (November 3rd) of this past week, Linkscape’s index updated in record time: just 3 weeks. New link data is once again available in OpenSiteExplorer, through the SEOmoz API, and in the Mozbar. Here are the stats for this latest update (our 46th index update):
- 43,077,387,028 (43 billion) URLs
- 480,597,551 (480 million) Subdomains
- 105,570,741 (105 million) Root Domains
- 356,255,241,471 (356 billion) Links
- Followed vs. nofollowed:
  - 2.18% of all links found were nofollowed
  - 58.21% of nofollowed links are internal, 41.79% are external
- Rel canonical: 10.46% of all pages now employ a rel=canonical tag
- The average page has 77.28 links on it (down 0.19 from the last index):
  - 64.86 internal links on average
  - 12.42 external links on average
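The figures above are rounded percentages and averages, so any counts derived from them are approximate. As a quick sanity check, here is a short Python sketch (our own back-of-the-envelope arithmetic, not an official SEOmoz calculation) that confirms the internal and external averages add up to the per-page total and estimates how many nofollowed links the percentages imply:

```python
# Figures taken from the index stats listed above (percentages are rounded)
total_links = 356_255_241_471
nofollow_share = 0.0218            # 2.18% of all links are nofollowed
internal_nofollow_share = 0.5821   # 58.21% of nofollowed links are internal

avg_internal = 64.86               # internal links per page, on average
avg_external = 12.42               # external links per page, on average

# Internal + external should equal the 77.28 links per page reported above
avg_total = round(avg_internal + avg_external, 2)

# Approximate absolute counts implied by the rounded percentages
nofollowed = total_links * nofollow_share
internal_nofollowed = nofollowed * internal_nofollow_share
external_nofollowed = nofollowed - internal_nofollowed

print(f"Average links per page: {avg_total}")
print(f"Nofollowed links: ~{nofollowed / 1e9:.2f} billion")
print(f"  internal: ~{internal_nofollowed / 1e9:.2f} billion")
print(f"  external: ~{external_nofollowed / 1e9:.2f} billion")
```

In other words, roughly 7.8 billion of the 356 billion links carry rel=nofollow, with somewhat more of them pointing within a site than to other sites.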
Since August, we’ve been struggling with the particularly devious problem of binary files in the index inflating link counts and showing links that Google and Bing probably aren’t counting. In September’s crawl, we put a blacklist on these files and saw a reduction of roughly 40% in binary files. This time, we’ve made even more progress (though it’s tough to know exactly how much, and we’re continuing to evaluate), and you should see a significant reduction in these binary files.
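The post doesn’t describe how the blacklist works internally. Purely as an illustration, here is a minimal Python sketch of one common approach, assuming the blacklist is keyed on file extensions in the URL path; the extension set and the `is_blacklisted` and `filter_crawl_queue` helpers are hypothetical, not Linkscape’s actual code:

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical blacklist of binary file extensions (illustrative only;
# the post does not say which file types were actually blocked)
BINARY_EXTENSIONS = {".exe", ".zip", ".gz", ".iso", ".dmg", ".bin"}


def is_blacklisted(url: str) -> bool:
    """Return True if the URL's path ends in a blacklisted extension."""
    path = urlparse(url).path  # ignores query strings and fragments
    ext = posixpath.splitext(path)[1].lower()
    return ext in BINARY_EXTENSIONS


def filter_crawl_queue(urls: list[str]) -> list[str]:
    """Drop blacklisted URLs before they are fetched and indexed."""
    return [u for u in urls if not is_blacklisted(u)]


queue = [
    "http://example.com/index.html",
    "http://example.com/downloads/setup.EXE",
    "http://example.com/archive.zip?version=2",
]
print(filter_crawl_queue(queue))
```

An extension check alone can’t catch binaries served from extensionless URLs, which may be one reason a problem like this is hard to measure precisely; a real crawler would likely also look at the response’s Content-Type.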
Partly because of the reduction in these files, processing time for the Linkscape index dropped, which let us produce a much faster index update. However, we’re planning to produce a much larger index in December, so we expect processing time to rise again. On the plus side, that will mean a lot more link data. In 2012, we’re aiming for an index of 100 billion+ URLs, closer to what we’ve heard Bing and Google keep in their main indices (~120-140 billion URLs).
As always, feedback on the new index is greatly appreciated – if you’re seeing stuff we’ve missed, files we shouldn’t have crawled or metrics that feel wrong, please let us know. Our engineers would love to hear from you.